Abstract

Given a system of multiple random variables, a new measure called the L-multivariate association
coefficient is defined using (conditional) entropy. Unlike traditional correlation measures, the
L-multivariate association coefficient measures the multiassociations or multirelations among the
multiple variables in the given system; that is, the L-multivariate association coefficient measures
the degree of the association for the given system. The L-multivariate association coefficient for
a system of two random variables is also called the L-bivariate association coefficient. The
association measured by the L-multivariate association coefficient is a general type of association,
not any specific type of linear or nonlinear association. Unlike the K-dependence coefficient,
which is an asymmetrical measure, the L-multivariate association coefficient is a symmetrical
measure. A direct application of the L-multivariate association coefficient is in variable selection
or variable reduction. This paper also explores the relationship between the L-multivariate
association coefficient and the K-dependence coefficient.

Key words: Entropy, conditional entropy, L-bivariate association coefficient, L-multivariate
association coefficient, K-dependence coefficient

1 Introduction

Entropy is a measure of uncertainty for a random system or a group of random variables. Let
$X = (X_1, X_2, \ldots, X_n)^T$ and $Y = (Y_1, Y_2, \ldots, Y_m)^T$ be two groups of discrete random variables
with $N$ and $M$ possible states, respectively, denoted by $\{x_1, \ldots, x_N\}$ and $\{y_1, \ldots, y_M\}$. Joint
probabilities of these possible states are denoted by $p(x_i, y_j)$ ($i = 1, \ldots, N$; $j = 1, \ldots, M$), while
marginal probabilities are denoted by $p(x_i)$ ($i = 1, \ldots, N$) and $p(y_j)$ ($j = 1, \ldots, M$). The following
are the mathematical expressions for the entropy and conditional entropy (Shannon, 1948).


Definition 1.

Entropy in terms of a group of discrete random variables is defined as

$$H(X) = -\sum_{i=1}^{N} p(x_i) \ln(p(x_i)). \tag{1}$$

Since $0 \le p(x_i) \le 1$ for $i = 1, 2, \ldots, N$, $H(X)$ in (1) is greater than or equal to zero;
that is,

$$H(X) \ge 0. \tag{2}$$


Definition 2.

Conditional entropy of $X$ given $Y$ is defined by

$$H(X|Y) = -\sum_{i=1}^{N} \sum_{j=1}^{M} p(x_i, y_j) \ln(p(x_i|y_j)), \tag{3}$$

where $p(x_i, y_j)$ is the joint distribution of $(X, Y)$ and $p(x_i|y_j)$ is the conditional probability of $x_i$
given $y_j$.

Similar to (2),

$$H(X|Y) \ge 0. \tag{4}$$

Conditional entropy $H(X|Y)$ measures the conditional uncertainty of $X$ given $Y$. The
conditional entropy $H(X|Y)$ is less than or equal to the entropy $H(X)$; that is,

$$H(X|Y) \le H(X). \tag{5}$$

Equality holds if and only if $X$ and $Y$ are independent.
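
As a numerical illustration of Definitions 1 and 2 and of inequality (5), the following minimal sketch computes $H(X)$ and $H(X|Y)$ from a small joint probability table. The function names, the nested-list layout `joint[i][j]` for $p(x_i, y_j)$, and the example numbers are assumptions made for this illustration only; terms with zero probability are skipped, following the usual convention $0 \ln 0 = 0$.

```python
import math

def entropy(probs):
    """Entropy (1) with natural logarithm; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """Conditional entropy (3) of X given Y, with joint[i][j] = p(x_i, y_j)."""
    p_y = [sum(col) for col in zip(*joint)]  # marginal probabilities p(y_j)
    h = 0.0
    for row in joint:
        for j, p_xy in enumerate(row):
            if p_xy > 0:
                # p(x_i | y_j) = p(x_i, y_j) / p(y_j)
                h -= p_xy * math.log(p_xy / p_y[j])
    return h

# Hypothetical joint tables, chosen only for illustration.
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[0.25, 0.25],
               [0.25, 0.25]]

for name, joint in [("dependent", dependent), ("independent", independent)]:
    h_x = entropy([sum(row) for row in joint])
    h_x_given_y = conditional_entropy(joint)
    # Inequality (5): H(X|Y) <= H(X), with equality in the independent case.
    print(name, h_x, h_x_given_y, h_x_given_y <= h_x + 1e-12)
```

In the dependent table the conditional entropy is strictly smaller than $H(X)$, while in the independent table the two values coincide, as (5) states.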


Definition 3.

$$K(X : Y) = \frac{H(X) - H(X|Y)}{H(X)}, \tag{6}$$

where $H(X)$ is the entropy of $X$ and $H(X|Y)$ is the conditional entropy of $X$ given $Y$.

$K(X : Y)$ in (6) is called the K-dependence coefficient, or dependence coefficient, of $X$ to $Y$,
or $X$ dependence on $Y$.

The K-dependence coefficient in (6) measures the degree to which $X$ depends on $Y$, with a
range between zero and one. Following are the two fundamental theorems of the K-dependence
coefficient (Kong, 2007a, 2007b).

Theorem 1

$$K(X : Y) = 0 \tag{7}$$

if and only if $X$ and $Y$ are independent of each other.

Theorem 2

Assume that $p(x_i) > 0$ and $p(y_j) > 0$ ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, M$). Then

$$K(X : Y) = 1$$

if and only if for each $y_j$, $\exists\, x_{i_j}$ such that $p(X = x_{i_j} | Y = y_j) = 1$ and $p(X = x_k | Y = y_j) = 0$,
where $k \ne i_j$.

Theorems 1 and 2 state that $K(X : Y) = 0$ is equivalent to the independence between $X$ and
$Y$, while $K(X : Y) = 1$ is equivalent to $X$'s total dependence on $Y$.
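
Theorems 1 and 2 can be checked numerically on small examples. The sketch below is an illustration only: the helper names, the nested-list layout `joint[i][j]` for $p(x_i, y_j)$, and the example tables are assumptions, and the function requires $H(X) > 0$. It evaluates (6) through the chain rule $H(X|Y) = H(X, Y) - H(Y)$.

```python
import math

def entropy(probs):
    """Entropy with natural logarithm; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def k_dependence(joint):
    """K(X:Y) = (H(X) - H(X|Y)) / H(X), with joint[i][j] = p(x_i, y_j)."""
    h_x = entropy(sum(row) for row in joint)
    h_y = entropy(sum(col) for col in zip(*joint))
    h_xy = entropy(p for row in joint for p in row)
    h_x_given_y = h_xy - h_y  # chain rule: H(X|Y) = H(X,Y) - H(Y)
    return (h_x - h_x_given_y) / h_x

# Hypothetical tables. In `determined`, every column (state of Y) has a
# single nonzero entry, so X is fully determined by Y as in Theorem 2.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
determined = [[0.3, 0.0, 0.2],
              [0.0, 0.5, 0.0]]
swapped = [list(col) for col in zip(*determined)]  # roles of X and Y exchanged

print(k_dependence(independent))  # close to 0 (Theorem 1)
print(k_dependence(determined))   # close to 1 (Theorem 2)
print(k_dependence(swapped))      # strictly between 0 and 1
```

In the third call the roles are exchanged: $Y$ is not determined by $X$, so the coefficient falls strictly between zero and one, which illustrates that the K-dependence coefficient is asymmetrical.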

L-Bivariate Association Coefficient 

A measure frequently used for the relation between two random variables is the linear 
correlation coefficient. Because the linear correlation coefficient is defined from the first and second 
moments of the (joint) distribution of the random variables, the linear correlation coefficient 
only partially utilizes the information of the random variables to measure the relation between 
them. In this section, a new measure called the L-bivariate association coefficient is defined from 
(conditional) entropy to measure the relation between two groups of the random variables. The 
basic idea behind the L-bivariate association coefficient is to fully utilize the information of the 
random variables to measure the relation among them. Therefore, the L-bivariate association 
coefficient measures both linear and nonlinear relationships between two groups of random variables. 


Definition 4.

$$L(X, Y) = \frac{H(X) - H(X|Y)}{H(X, Y)}. \tag{8}$$

$L(X, Y)$ in (8) is called the L-bivariate association coefficient for the random variables $X$ and $Y$.

First, the L-bivariate association coefficient is symmetrical; that is, $L(X, Y) = L(Y, X)$.
This follows directly from

$$H(X) - H(X|Y) = H(Y) - H(Y|X).$$
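
The symmetry identity can be spelled out with the chain rule for entropy. The short derivation below is added for illustration; it relies only on the standard identity $H(X, Y) = H(Y) + H(X|Y) = H(X) + H(Y|X)$.

```latex
\begin{align*}
H(X) - H(X|Y) &= H(X) - \bigl(H(X,Y) - H(Y)\bigr) \\
              &= H(X) + H(Y) - H(X,Y) \\
              &= H(Y) - \bigl(H(X,Y) - H(X)\bigr) \\
              &= H(Y) - H(Y|X).
\end{align*}
```

The middle expression is symmetric in $X$ and $Y$, and so is the denominator $H(X, Y)$ in (8), which gives $L(X, Y) = L(Y, X)$.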

Second, the L-bivariate association coefficient ranges between zero and one when $H(X, Y) > 0$.

Theorem 3

$$0 \le L(X, Y) \le 1. \tag{9}$$

Proof: First, according to (2) and (5), there are

$$H(X, Y) > 0,$$

$$H(X) - H(X|Y) \ge 0.$$

From Definition 4,

$$L(X, Y) \ge 0.$$

Also, there is

$$H(X) - H(X|Y) \le H(X) \le H(X, Y).$$

According to Definition 4, this implies

$$L(X, Y) \le 1.$$

Therefore, Theorem 3 is proved.
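
The range and symmetry of the L-bivariate association coefficient can be illustrated with a few lines of code. This is a sketch under stated assumptions: the function names, the nested-list layout `joint[i][j]` for $p(x_i, y_j)$, and the three example tables are hypothetical, and $H(X, Y) > 0$ is required. The numerator of (8) is evaluated as $H(X) + H(Y) - H(X, Y)$, which equals $H(X) - H(X|Y)$ by the chain rule.

```python
import math

def entropy(probs):
    """Entropy with natural logarithm; zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def l_bivariate(joint):
    """L(X,Y) from (8), with joint[i][j] = p(x_i, y_j) and H(X,Y) > 0."""
    h_x = entropy(sum(row) for row in joint)
    h_y = entropy(sum(col) for col in zip(*joint))
    h_xy = entropy(p for row in joint for p in row)
    return (h_x + h_y - h_xy) / h_xy

def transpose(joint):
    """Exchange the roles of X and Y."""
    return [list(col) for col in zip(*joint)]

# Hypothetical tables for illustration.
independent = [[0.25, 0.25], [0.25, 0.25]]          # X and Y independent
one_to_one = [[0.5, 0.0], [0.0, 0.5]]               # X and Y determine each other
determined = [[0.3, 0.0, 0.2], [0.0, 0.5, 0.0]]     # X depends totally on Y only

print(l_bivariate(independent))
print(l_bivariate(one_to_one))
print(l_bivariate(determined), l_bivariate(transpose(determined)))
```

The first two tables give values close to zero and one, respectively. The third table gives the same intermediate value in both orientations, in line with the symmetry $L(X, Y) = L(Y, X)$.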


Theorem 4

$$L(X, Y) = 0 \tag{10}$$

if and only if $X$ and $Y$ are independent.

Proof: According to Definition 4,

$$L(X, Y) = 0$$

if and only if

$$H(X) - H(X|Y) = 0. \tag{11}$$

Also from (5), the equation in (11) holds if and only if $X$ and $Y$ are independent. Thus,
Theorem 4 is proved.

Theorem 5

$$L(X, Y) = 1 \tag{12}$$

if and only if

$$K(X : Y) = 1 \text{ and } K(Y : X) = 1. \tag{13}$$

Proof: First, assume $L(X, Y) = 1$. From Definition 4,

$$H(X, Y) = H(X) - H(X|Y).$$

Therefore,

$$H(X, Y) = H(X) - H(X|Y) \le H(X) \le H(X, Y),$$

and consequently,

$$H(X) = H(X, Y) = H(X) - H(X|Y),$$

which implies

$$H(X|Y) = 0.$$

Similarly, the following can be obtained:

$$H(Y|X) = 0.$$

From Definition 3,

$$K(X : Y) = 1,$$

$$K(Y : X) = 1.$$

Second, assume (13). From Definition 3,

$$H(X|Y) = 0,$$

$$H(Y|X) = 0.$$

Therefore,

$$H(X, Y) = H(X) + H(Y|X) = H(X).$$

By Definition 4,

$$L(X, Y) = 1.$$

Theorem 5 has proved that $L(X, Y) = 1$ if and only if $X$ totally depends on $Y$ ($K(X : Y) = 1$)
and $Y$ totally depends on $X$ ($K(Y : X) = 1$). Theorems 4 and 5 show the two extreme cases for
the L-bivariate association coefficient. For general cases, the L-bivariate association coefficient
should be between zero and one.

One application of the L-bivariate association coefficient is in variable selection. Consider a
system of $n$ random variables denoted by

$$S_n = \{X_1, \ldots, X_n\},$$

and a subsystem of $S_n$ denoted by

$$S(i_1, \ldots, i_k) = \{X_{i_1}, \ldots, X_{i_k}\}, \text{ where } \{i_1, \ldots, i_k\} \subset \{1, \ldots, n\}.$$

From Definition 4,

$$\begin{aligned}
L((S_n), (S(i_1, \ldots, i_k))) &= \frac{H(S_n) - H(S_n | S(i_1, \ldots, i_k))}{H(S_n, S(i_1, \ldots, i_k))} \\
&= \frac{H(S_n) + H(S(i_1, \ldots, i_k)) - H(S_n, S(i_1, \ldots, i_k))}{H(S_n)} \\
&= \frac{H(S_n) + H(S(i_1, \ldots, i_k)) - H(S_n)}{H(S_n)} \\
&= \frac{H(S(i_1, \ldots, i_k))}{H(S_n)} \\
&= \frac{H(X_{i_1}, \ldots, X_{i_k})}{H(X_1, \ldots, X_n)}.
\end{aligned}$$

Therefore,

$$L((S_n), (S(i_1, \ldots, i_k))) = \frac{H(X_{i_1}, \ldots, X_{i_k})}{H(X_1, \ldots, X_n)}. \tag{14}$$

Given a level $\alpha$ with $1 \ge \alpha > 0$, say $\alpha = 0.99$, select a subsystem $S(i_1, \ldots, i_k)$ such that

$$L(S_n, S(i_1, \ldots, i_k)) \ge \alpha.$$

Under the given level $\alpha$, the selected subsystem $S(i_1, \ldots, i_k)$ carries at least $100\alpha\%$ of the
information of $S_n$, but the number of variables in $S(i_1, \ldots, i_k)$ is less than $n$ in general. If $\alpha = 1$,
then all of the variables will be selected and no variable reduction takes place unless some
variables totally depend on others.
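
The selection rule based on (14) can be sketched in code. The paper does not prescribe how to search for the subsystem, so the greedy forward search below is an assumption made for illustration, as are the function names and the representation of the joint distribution as a dictionary from state tuples to probabilities. The sketch also assumes $H(S_n) > 0$.

```python
import math
from collections import defaultdict

def subset_entropy(joint, idx):
    """Joint entropy of the variables at positions idx.

    joint maps state tuples (x_1, ..., x_n) to probabilities.
    """
    marginal = defaultdict(float)
    for state, p in joint.items():
        marginal[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)

def select_variables(joint, n, alpha):
    """Greedy forward search (an assumption, not the paper's procedure):
    repeatedly add the variable that raises H(subset) the most, and stop
    once the ratio in (14) reaches alpha."""
    h_all = subset_entropy(joint, range(n))
    chosen = []
    while len(chosen) < n:
        candidates = [i for i in range(n) if i not in chosen]
        best = max(candidates, key=lambda i: subset_entropy(joint, chosen + [i]))
        chosen.append(best)
        if subset_entropy(joint, chosen) / h_all >= alpha:
            break
    return sorted(chosen)

# Hypothetical system: X_1 and X_2 are independent fair bits, X_3 = X_1 XOR X_2.
joint = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
print(select_variables(joint, 3, 0.99))
```

In this example any two of the three variables determine the third, so two variables already carry all of the information of the system and the search stops there. A greedy search is not guaranteed to find the smallest admissible subsystem in general.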

Actually, (14) can also be obtained in a similar way from the K-dependence coefficient in
Definition 3. In the following application, however, only the L-bivariate association coefficient
can be applied. Assume $S_1$ and $S_2$ are two subsystems of $S_n$. Given a level $\alpha$ with $1 \ge \alpha > 0$,
say $\alpha = 0.99$, if

$$L(S_1, S_2) \ge \alpha, \tag{15}$$

then the association between $S_1$ and $S_2$ is very strong, and therefore $S_1$ and $S_2$ are redundant
(one of these subsystems can be removed from the system).

Although the L-bivariate association coefficient is defined to measure the degree of the
association for the system of two (groups of) random variables, it can also measure the degree of
the association for a system of multiple (groups of) random variables. First, examine the system
of three random variables denoted by

$$S_3 = \{X_1, X_2, X_3\}.$$

Theorem 6

$$L((X_1, X_2), X_3) = 0 \text{ and } L(X_1, X_2) = 0$$

if and only if

$$p(x_{i_1}, x_{i_2}, x_{i_3}) = p(x_{i_1}) \times p(x_{i_2}) \times p(x_{i_3}),$$

where $p(x_{i_1}, x_{i_2}, x_{i_3})$ ($i_1 = 1, \ldots, m_1$; $i_2 = 1, \ldots, m_2$; $i_3 = 1, \ldots, m_3$) are the joint probabilities
of the random variables $X_1$, $X_2$, and $X_3$, and $p(x_{i_t})$ are the probabilities of the random variable $X_t$
($t = 1, 2, 3$).

Proof: First, from Theorem 4,

$$L((X_1, X_2), X_3) = 0$$

if and only if $(X_1, X_2)$ and $X_3$ are independent:

$$p(x_{i_1}, x_{i_2}, x_{i_3}) = p(x_{i_1}, x_{i_2}) \times p(x_{i_3}).$$

Second,

$$L(X_1, X_2) = 0$$

if and only if $X_1$ and $X_2$ are independent:

$$p(x_{i_1}, x_{i_2}) = p(x_{i_1}) \times p(x_{i_2}).$$

It is obvious that

$$p(x_{i_1}, x_{i_2}, x_{i_3}) = p(x_{i_1}) \times p(x_{i_2}) \times p(x_{i_3})$$

if and only if

$$p(x_{i_1}, x_{i_2}, x_{i_3}) = p(x_{i_1}, x_{i_2}) \times p(x_{i_3}), \text{ and}$$

$$p(x_{i_1}, x_{i_2}) = p(x_{i_1}) \times p(x_{i_2}).$$

Therefore, Theorem 6 is proved.

Theorem 6 shows that, for a given system of three random variables, if $X_3$ is independent
of the system $\{X_1, X_2\}$ and $X_1$ is independent of $X_2$, then this given system is in a (jointly)
independent state. Similarly, the following two propositions are also true:

$$L((X_1, X_3), X_2) = 0 \text{ and } L(X_1, X_3) = 0$$

if and only if

$$p(x_{i_1}, x_{i_2}, x_{i_3}) = p(x_{i_1}) \times p(x_{i_2}) \times p(x_{i_3}),$$

and

$$L((X_2, X_3), X_1) = 0 \text{ and } L(X_2, X_3) = 0$$

if and only if

$$p(x_{i_1}, x_{i_2}, x_{i_3}) = p(x_{i_1}) \times p(x_{i_2}) \times p(x_{i_3}).$$

Theorem 6 measures the state of independence for a system. The following theorem measures
a state of perfect association for the same system.

Theorem 7(a)

$$L((X_1, X_2), X_3) = 1 \text{ and } L(X_1, X_2) = 1$$

if and only if

$$K(X_i : X_j) = 1 \quad (i \ne j \text{ and } i, j = 1, 2, 3).$$

Proof: First, assume

$$L(X_1, X_2) = 1 \text{ and } L((X_1, X_2), X_3) = 1.$$

According to Theorem 5,

$$K(X_1 : X_2) = 1 \text{ and } K(X_2 : X_1) = 1$$

and

$$K(X_3 : (X_1, X_2)) = 1 \text{ and } K((X_1, X_2) : X_3) = 1.$$

By Definition 3,

$$H(X_3 | X_1, X_2) = 0 \text{ and } H(X_1, X_2 | X_3) = 0.$$

Also,

$$0 \le H(X_1 | X_3) \le H(X_1, X_2 | X_3) = 0,$$

$$0 \le H(X_2 | X_3) \le H(X_1, X_2 | X_3) = 0.$$

Therefore,

$$H(X_1 | X_3) = 0 \text{ and } H(X_2 | X_3) = 0. \tag{16}$$

On the other hand,

$$\begin{aligned}
H(X_1, X_2, X_3) &= H(X_1, X_2) + H(X_3 | X_1, X_2) \\
&= H(X_2) + H(X_1 | X_2) = H(X_2).
\end{aligned}$$

Therefore,

$$H(X_1, X_2, X_3) = H(X_2). \tag{17}$$

Also, there is

$$\begin{aligned}
H(X_1, X_2, X_3) &= H(X_2, X_3) + H(X_1 | X_2, X_3) \\
&= H(X_2) + H(X_3 | X_2).
\end{aligned}$$

The above equation follows directly from

$$0 \le H(X_1 | X_2, X_3) \le H(X_1 | X_2) = 0.$$

This implies

$$H(X_1 | X_2, X_3) = 0.$$

Therefore,

$$H(X_1, X_2, X_3) = H(X_2) + H(X_3 | X_2). \tag{18}$$

From (17) and (18),

$$H(X_2) = H(X_2) + H(X_3 | X_2).$$

Therefore,

$$H(X_3 | X_2) = 0. \tag{19}$$

Similarly,

$$H(X_3 | X_1) = 0. \tag{20}$$

By (16), (19), and (20),

$$K(X_1 : X_3) = 1 \text{ and } K(X_3 : X_1) = 1,$$

$$K(X_2 : X_3) = 1 \text{ and } K(X_3 : X_2) = 1.$$

This is the proof of necessity.


Second, assume

$$K(X_i : X_j) = 1 \quad (i \ne j \text{ and } i, j = 1, 2, 3).$$

From Theorem 5,

$$L(X_1, X_2) = 1. \tag{21}$$

Also, by Definition 3,

$$H(X_i | X_j) = 0 \quad (i \ne j \text{ and } i, j = 1, 2, 3).$$

There is

$$0 \le H(X_3 | X_1, X_2) \le H(X_3 | X_1) = 0.$$

Therefore,

$$H(X_3 | X_1, X_2) = 0.$$

By Definition 3,

$$K(X_3 : (X_1, X_2)) = 1. \tag{22}$$

Similarly,

$$0 \le H(X_1, X_2 | X_3) \le H(X_1 | X_3) + H(X_2 | X_3) = 0 + 0 = 0.$$

Therefore,

$$H(X_1, X_2 | X_3) = 0.$$

According to Definition 3,

$$K((X_1, X_2) : X_3) = 1. \tag{23}$$

From (22) and (23),

$$L((X_1, X_2), X_3) = 1. \tag{24}$$

The proof of sufficiency follows from (21) and (24).


From Theorem 5, Theorem 7(a) can be rewritten as Theorem 7(b).

Theorem 7(b)

$$L((X_1, X_2), X_3) = 1 \text{ and } L(X_1, X_2) = 1$$

if and only if

$$L(X_1, X_2) = 1, \quad L(X_1, X_3) = 1, \quad L(X_2, X_3) = 1.$$

Proof: This is obvious from Theorem 5.

Similar to Theorem 6, the following propositions are also true:

$$L((X_1, X_3), X_2) = 1 \text{ and } L(X_1, X_3) = 1$$

if and only if

$$L(X_1, X_2) = 1, \quad L(X_1, X_3) = 1, \quad L(X_2, X_3) = 1,$$

and

$$L((X_2, X_3), X_1) = 1 \text{ and } L(X_2, X_3) = 1$$

if and only if

$$L(X_1, X_2) = 1, \quad L(X_1, X_3) = 1, \quad L(X_2, X_3) = 1.$$

Theorems 6 and 7(a) or 7(b) measure the statistical association of three random variables.
The idea in Theorems 6 and 7(a) or 7(b) can be extended to the case of multiple variables. The
following theorems measure the statistical association for a system of $n$ random variables.

Theorem 8

$$\begin{aligned}
L(X_1, (X_2, \ldots, X_n)) &= 0, \\
L(X_2, (X_3, \ldots, X_n)) &= 0, \\
&\;\;\vdots \\
L(X_{n-1}, X_n) &= 0,
\end{aligned}$$

if and only if

$$p(x_{i_1}, \ldots, x_{i_n}) = p(x_{i_1}) \times \cdots \times p(x_{i_n}),$$

where $p(x_{i_1}, \ldots, x_{i_n})$ ($i_1 = 1, \ldots, m_1$; $\ldots$; $i_n = 1, \ldots, m_n$) are the joint probabilities of the random
variables $X_1, \ldots, X_n$, and $p(x_{i_t})$ ($i_t = 1, \ldots, m_t$) are the probabilities of the random variable $X_t$.

Proof: The proof is similar to the proof of Theorem 6.

Theorem 9(a)

$$\begin{aligned}
L(X_1, (X_2, \ldots, X_n)) &= 1, \\
L(X_2, (X_3, \ldots, X_n)) &= 1, \\
&\;\;\vdots \\
L(X_{n-1}, X_n) &= 1
\end{aligned}$$

if and only if

$$K(X_i : X_j) = 1 \quad (i \ne j \text{ and } i, j = 1, 2, \ldots, n).$$

Proof: The proof is similar to the proof of Theorem 7(a).

Theorem 9(a) can also be rewritten in terms of the L-bivariate association coefficient.

Theorem 9(b)

$$\begin{aligned}
L(X_1, (X_2, \ldots, X_n)) &= 1, \\
L(X_2, (X_3, \ldots, X_n)) &= 1, \\
&\;\;\vdots \\
L(X_{n-1}, X_n) &= 1
\end{aligned}$$

if and only if

$$L(X_i, X_j) = 1 \quad (i < j \text{ and } i, j = 1, 2, \ldots, n).$$

Proof: This is obvious from Theorem 5.

In this section, the L-bivariate association coefficient is defined to measure the association
between two (groups of) random variables. In order to measure the association among three
random variables, two L-bivariate association coefficients are needed. In general, $n - 1$ L-bivariate
association coefficients are needed to measure the association among $n$ random variables. As
shown, many different combinations of the L-bivariate association coefficients can measure the
$n$ ($\ge 3$) random variables, and each combination of the L-bivariate association coefficients
measures a different aspect of the given system of $n$ random variables. When systems of
multiple random variables are compared, these multivalued measures may cause confusion. It is
obvious that the L-bivariate association coefficient needs to be improved upon or extended to
measure the association for a given system of multiple random variables. In the next section,
a new concept called the L-multivariate association coefficient is defined that measures the degree
of the association for a system of multiple random variables.

2 L-Multivariate Association Coefficient 

The central issue in multivariate statistics is to study the relation among random variables. 
Traditionally the relation among the random variables means the relation between two random 
variables or two groups of random variables. In the real world, the relation not only exists between 
two objects, but it also exists for multiple objects. Sometimes people need to measure the relation 
in a system of multiple objects, which is called the association among the multiple variables. One 
example of this sort of system is the solar system, which has ten big objects: one star and nine
planets. In this section, a new concept called the L-multivariate association coefficient is defined
that measures the association among the multiple random variables, and then some fundamental 
properties of the L-multivariate association coefficient are explored. 

Definition 5.

$$L(X_1, \ldots, X_n) = \frac{H(X_1) + \cdots + H(X_n) - H(X_1, \ldots, X_n)}{(n - 1) H(X_1, \ldots, X_n)} \quad (n \ge 2). \tag{25}$$

The L-multivariate association coefficient for the system of the random variables $X_1, \ldots, X_n$
is represented by (25).

It is obvious that, in the case of $n = 2$, the L-multivariate association coefficient is
actually the L-bivariate association coefficient. Also, from Definition 5, the L-multivariate
association coefficient does not depend on the order of the sequence of the random variables $X_1, \ldots, X_n$:

$$L(X_1, \ldots, X_n) = L(X_{i_1}, \ldots, X_{i_n}),$$

where $(i_1, \ldots, i_n)$ is a permutation of $(1, 2, \ldots, n)$.
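
Definition 5 translates directly into a short computation. The following is a minimal sketch, assuming the joint distribution is available as a dictionary from state tuples to probabilities; the function names and the example system are hypothetical, and the sketch requires $n \ge 2$ and $H(X_1, \ldots, X_n) > 0$.

```python
import math
from collections import defaultdict

def subset_entropy(joint, idx):
    """Joint entropy of the variables at positions idx.

    joint maps state tuples (x_1, ..., x_n) to probabilities.
    """
    marginal = defaultdict(float)
    for state, p in joint.items():
        marginal[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)

def l_multivariate(joint, n):
    """Coefficient (25); assumes n >= 2 and H(X_1, ..., X_n) > 0."""
    h_joint = subset_entropy(joint, range(n))
    h_sum = sum(subset_entropy(joint, [i]) for i in range(n))
    return (h_sum - h_joint) / ((n - 1) * h_joint)

# Hypothetical system: X_1 and X_2 are independent fair bits, X_3 = X_1 XOR X_2.
xor_system = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
# The same system with the variables listed in a different order.
reordered = {(s[2], s[0], s[1]): p for s, p in xor_system.items()}

print(l_multivariate(xor_system, 3))
print(l_multivariate(reordered, 3))
```

Both calls return the same value, approximately 0.25, in agreement with the permutation property. In this example every pair of variables is independent, so each pairwise L-bivariate coefficient is zero, while (25) still registers the association that exists only among all three variables together.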


Similar to the L-bivariate association coefficient, the L-multivariate association coefficient
ranges between zero and one.

Theorem 10

$$0 \le L(X_1, \ldots, X_n) \le 1. \tag{26}$$

Proof: First, there are

$$H(X_1, \ldots, X_n) \le H(X_1) + \cdots + H(X_n),$$

and

$$H(X_1, \ldots, X_n) > 0.$$

By Definition 5,

$$L(X_1, \ldots, X_n) = \frac{H(X_1) + \cdots + H(X_n) - H(X_1, \ldots, X_n)}{(n - 1) H(X_1, \ldots, X_n)} \ge 0.$$

On the other hand, for any $i \in \{1, \ldots, n\}$,

$$H(X_i) \le H(X_1, \ldots, X_n).$$

There is

$$\sum_{i=1}^{n} H(X_i) \le n \times H(X_1, \ldots, X_n).$$

Therefore,

$$\sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n) \le (n - 1) \times H(X_1, \ldots, X_n).$$

From Definition 5,

$$L(X_1, \ldots, X_n) = \frac{\sum_{i=1}^{n} H(X_i) - H(X_1, \ldots, X_n)}{(n - 1) H(X_1, \ldots, X_n)} \le 1.$$

Theorem 10 shows that the L-multivariate association coefficient is a normalized measure.
This makes it possible to compare the association among multiple random variables in
different systems.


Theorem 11

$$L(X_1, \ldots, X_n) = 0 \tag{27}$$

if and only if

$$p(x_{i_1}, \ldots, x_{i_n}) = p(x_{i_1}) \times \cdots \times p(x_{i_n}), \tag{29}$$

where $p(x_{i_1}, \ldots, x_{i_n})$ ($i_1 = 1, \ldots, m_1$; $\ldots$; $i_n = 1, \ldots, m_n$) are the joint probabilities of the random
variables $X_1, \ldots, X_n$, and $p(x_{i_t})$ ($i_t = 1, \ldots, m_t$) are the probabilities of the random variable $X_t$.

Proof: Each of the following equations is equivalent to the next:

$$L(X_1, \ldots, X_n) = 0,$$

$$H(X_1) + \cdots + H(X_n) - H(X_1, \ldots, X_n) = 0,$$

$$\sum_{i_1} \cdots \sum_{i_n} p(x_{i_1}, \ldots, x_{i_n}) \ln \frac{p(x_{i_1}, \ldots, x_{i_n})}{\prod_{t=1}^{n} p(x_{i_t})} = 0. \tag{30}$$

By Kullback-Leibler divergence theory (Kullback, 1959), (30) holds if and only if (29) holds.

This is the proof of Theorem 11.

Theorem 11 shows that $L(X_1, \ldots, X_n) = 0$ if and only if $X_1, \ldots, X_n$ are (jointly) independent.
Therefore, $L(X_1, \ldots, X_n) = 0$ is equivalent to the independent state of the system of random
variables $X_1, \ldots, X_n$.


Theorem 12(a)

$$L(X_1, \ldots, X_n) = 1 \tag{31}$$

if and only if

$$K(X_i : X_j) = 1 \quad (i \ne j \text{ and } i, j = 1, 2, \ldots, n). \tag{33}$$

Proof: First, assume

$$L(X_1, \ldots, X_n) = 1.$$

By Definition 5,

$$\sum_{i=1}^{n} H(X_i) = n \times H(X_1, \ldots, X_n),$$

that is,

$$\sum_{i=1}^{n} \bigl(H(X_1, \ldots, X_n) - H(X_i)\bigr) = 0. \tag{34}$$

Also, for any $i = 1, \ldots, n$, there is

$$H(X_1, \ldots, X_n) - H(X_i) \ge 0.$$

By (34), there is

$$H(X_1, \ldots, X_n) = H(X_i).$$

Therefore, for any $i \ne j$,

$$H(X_j) \le H(X_i, X_j) \le H(X_1, \ldots, X_n) = H(X_j),$$

$$H(X_j) = H(X_i, X_j),$$

$$H(X_i | X_j) = 0,$$

$$K(X_i : X_j) = 1.$$

Second, assume

$$K(X_i : X_j) = 1 \text{ for any } i \ne j.$$

By Definition 3,

$$H(X_i | X_j) = 0 \text{ for any } i \ne j.$$

Therefore, for any $i = 1, \ldots, n$,

$$\begin{aligned}
&H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n | X_i) \\
&\quad \le H(X_1 | X_i) + \cdots + H(X_{i-1} | X_i) + H(X_{i+1} | X_i) + \cdots + H(X_n | X_i) \\
&\quad = 0 + \cdots + 0 + 0 + \cdots + 0 = 0.
\end{aligned}$$

This implies

$$H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n | X_i) = 0.$$

Then

$$\begin{aligned}
H(X_1, \ldots, X_n) &= H(X_i) + H(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n | X_i) \\
&= H(X_i).
\end{aligned}$$

This is

$$H(X_1, \ldots, X_n) = H(X_i),$$

so that

$$H(X_1) + \cdots + H(X_n) = n \times H(X_1, \ldots, X_n).$$

Therefore,

$$L(X_1, \ldots, X_n) = \frac{H(X_1) + \cdots + H(X_n) - H(X_1, \ldots, X_n)}{(n - 1) H(X_1, \ldots, X_n)} = 1.$$

This is the proof of Theorem 12(a).


Theorem 12(a) shows that $L(X_1, \ldots, X_n) = 1$ if and only if the random variables $X_1, \ldots, X_n$
totally depend on each other. In the case of $L(X_1, \ldots, X_n) = 1$, any variable in $X_1, \ldots, X_n$ totally
determines all of the other variables, or all of the other variables totally depend on this variable.
In terms of information, the random variables $X_1, \ldots, X_n$ can be thought of as a single random
variable in the case of $L(X_1, \ldots, X_n) = 1$. One special case is $L(X, \ldots, X) = 1$, which follows from

$$\begin{aligned}
L(X, \ldots, X) &= \frac{H(X) + \cdots + H(X) - H(X, \ldots, X)}{(n - 1) \times H(X, \ldots, X)} \\
&= \frac{n \times H(X) - H(X)}{(n - 1) \times H(X)} = 1.
\end{aligned}$$
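
The two extreme cases in Theorems 11 and 12(a) can be reproduced numerically. The sketch below is an illustration under stated assumptions: the function names, the dictionary representation of the joint distribution, and the example probabilities are hypothetical, and the code requires Python 3.8 or later for `math.prod`.

```python
import math
from collections import defaultdict
from itertools import product

def subset_entropy(joint, idx):
    """Joint entropy of the variables at positions idx."""
    marginal = defaultdict(float)
    for state, p in joint.items():
        marginal[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)

def l_multivariate(joint, n):
    """Coefficient (25); assumes n >= 2 and H(X_1, ..., X_n) > 0."""
    h_joint = subset_entropy(joint, range(n))
    h_sum = sum(subset_entropy(joint, [i]) for i in range(n))
    return (h_sum - h_joint) / ((n - 1) * h_joint)

# Three independent biased bits: the joint probability is the product
# of the marginal probabilities, as in (29).
marginals = [(0.2, 0.8), (0.5, 0.5), (0.7, 0.3)]
independent = {s: math.prod(m[v] for m, v in zip(marginals, s))
               for s in product((0, 1), repeat=3)}

# Three copies of a single variable X with P(X = 0) = 0.3.
copies = {(0, 0, 0): 0.3, (1, 1, 1): 0.7}

print(l_multivariate(independent, 3))  # close to 0, up to rounding error
print(l_multivariate(copies, 3))       # close to 1, up to rounding error
```

The first system is in the independent state and the second reduces to a single variable in terms of information, so the two values sit at the two ends of the range given by Theorem 10.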


Theorem 12(a) can also be expressed in terms of the L-bivariate association coefficient.

Theorem 12(b)

$$L(X_1, \ldots, X_n) = 1 \tag{35}$$

if and only if

$$L(X_i, X_j) = 1 \quad (i < j \text{ and } i, j = 1, 2, \ldots, n). \tag{37}$$

Proof: This is obvious from Theorem 5.

Now, consider a system of $n$ random variables, which is denoted by

$$S = \{X_1, \ldots, X_n\}. \tag{38}$$

The system $S$ is divided into two subsystems, which are

$$S_1 = \{X_{i_1}, \ldots, X_{i_{n_1}}\} \subset S,$$

$$S_2 = \{X_{j_1}, \ldots, X_{j_{n_2}}\} \subset S,$$

where

$$n_1 + n_2 = n,$$

$$S_1 \cup S_2 = S,$$

$$S_1 \cap S_2 = \emptyset.$$
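
Theorem 13(a)

$$L((S_1), (S_2)) = 0, \quad L(S_1) = 0, \quad L(S_2) = 0, \tag{39}$$

if and only if

$$p(x_1, \ldots, x_n) = p(x_1) \times \cdots \times p(x_n). \tag{40}$$

Proof: First, assume (39) holds. From Theorem 4,

$$L((S_1), (S_2)) = 0$$

implies

$$\begin{aligned}
p(x_1, \ldots, x_n) &= p(x_{i_1}, \ldots, x_{i_{n_1}}, x_{j_1}, \ldots, x_{j_{n_2}}) \\
&= p(x_{i_1}, \ldots, x_{i_{n_1}}) \times p(x_{j_1}, \ldots, x_{j_{n_2}}).
\end{aligned}$$

According to Theorem 11,

$$L(S_1) = 0$$

implies

$$p(x_{i_1}, \ldots, x_{i_{n_1}}) = p(x_{i_1}) \times \cdots \times p(x_{i_{n_1}}).$$

And,

$$L(S_2) = 0$$

implies

$$p(x_{j_1}, \ldots, x_{j_{n_2}}) = p(x_{j_1}) \times \cdots \times p(x_{j_{n_2}}).$$

Therefore,

$$\begin{aligned}
p(x_1, \ldots, x_n) &= p(x_{i_1}, \ldots, x_{i_{n_1}}) \times p(x_{j_1}, \ldots, x_{j_{n_2}}) \\
&= p(x_{i_1}) \times \cdots \times p(x_{i_{n_1}}) \times p(x_{j_1}) \times \cdots \times p(x_{j_{n_2}}) \\
&= p(x_1) \times \cdots \times p(x_n).
\end{aligned}$$

Second, assume

$$p(x_1, \ldots, x_n) = p(x_1) \times \cdots \times p(x_n).$$

Then

$$p(x_1, \ldots, x_n) = p(x_{i_1}, \ldots, x_{i_{n_1}}) \times p(x_{j_1}, \ldots, x_{j_{n_2}}),$$

$$p(x_{i_1}, \ldots, x_{i_{n_1}}) = p(x_{i_1}) \times \cdots \times p(x_{i_{n_1}}),$$

$$p(x_{j_1}, \ldots, x_{j_{n_2}}) = p(x_{j_1}) \times \cdots \times p(x_{j_{n_2}}).$$

Hence, by Theorems 4 and 11,

$$L((S_1), (S_2)) = 0,$$

$$L(S_1) = 0,$$

$$L(S_2) = 0.$$

This is the proof of Theorem 13(a).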

Theorem 13(a) can be rewritten in terms of the L-multivariate association coefficient.

Theorem 13(b)

$$L((S_1), (S_2)) = 0, \quad L(S_1) = 0, \quad L(S_2) = 0, \tag{41}$$

if and only if

$$L(S) = 0. \tag{42}$$

Proof: This is obvious from Theorem 11.

Theorem 14(a)

$$L((S_1), (S_2)) = 1, \quad L(S_1) = 1, \quad L(S_2) = 1 \tag{43}$$

if and only if

$$K(X_i : X_j) = 1 \quad (i \ne j \text{ and } i, j = 1, 2, \ldots, n). \tag{44}$$

Proof: First, assume (43) holds. According to Theorem 12(b), for any $X_i$ and $X_j \in S_1$ where
$i \ne j$,

$$L(S_1) = 1$$

implies

$$K(X_i : X_j) = 1.$$

Similarly, for any $X_i$ and $X_j \in S_2$ where $i \ne j$,

$$L(S_2) = 1$$

implies

$$K(X_i : X_j) = 1.$$

By Definition 4,

$$L((S_1), (S_2)) = 1$$

implies

$$H(S_1 | S_2) = 0,$$

$$H(S_2 | S_1) = 0.$$

Then

$$H(X_1, \ldots, X_n) = H(S_1) + H(S_2 | S_1) = H(S_1),$$

$$H(X_1, \ldots, X_n) = H(S_2) + H(S_1 | S_2) = H(S_2).$$

Also,

$$L(S_1) = 1$$

implies

$$H(S_1) = H(X_{i_1}) = \cdots = H(X_{i_{n_1}}).$$

Similarly, there is

$$H(S_2) = H(X_{j_1}) = \cdots = H(X_{j_{n_2}}).$$

Therefore,

$$H(X_1, \ldots, X_n) = H(S_1) = H(X_{i_1}) = \cdots = H(X_{i_{n_1}}),$$

$$H(X_1, \ldots, X_n) = H(S_2) = H(X_{j_1}) = \cdots = H(X_{j_{n_2}}).$$

This implies

$$H(X_1, \ldots, X_n) = H(X_1) = \cdots = H(X_n).$$

Therefore, for any $i \ne j$,

$$H(X_j) \le H(X_i, X_j) \le H(X_1, \ldots, X_n) = H(X_j),$$

$$H(X_j) = H(X_i, X_j),$$

$$H(X_i | X_j) = 0,$$

$$K(X_i : X_j) = 1.$$

Second, assume

$$K(X_i : X_j) = 1 \quad (i \ne j \text{ and } i, j = 1, 2, \ldots, n).$$

For any $X_i$ and $X_j \in S_1$ where $i \ne j$, there is

$$K(X_i : X_j) = 1,$$

and therefore

$$L(S_1) = 1.$$

Similarly, for any $X_i$ and $X_j \in S_2$ where $i \ne j$, there is

$$K(X_i : X_j) = 1,$$

and therefore

$$L(S_2) = 1.$$

For the case that $X_i \in S_1$ and $X_j \in S_2$,

$$K(X_i : X_j) = 1,$$

$$K(X_j : X_i) = 1,$$

so that

$$H(X_i | X_j) = 0,$$

$$H(X_j | X_i) = 0.$$

Therefore,

$$\begin{aligned}
H(S_1 | S_2) &\le H(X_{i_1} | S_2) + \cdots + H(X_{i_{n_1}} | S_2) \\
&\le H(X_{i_1} | X_j) + \cdots + H(X_{i_{n_1}} | X_j) \\
&= 0 + \cdots + 0 = 0,
\end{aligned}$$

and hence

$$H(S_1 | S_2) = 0.$$

Similarly, there is

$$H(S_2 | S_1) = 0,$$

so that

$$K(S_1 : S_2) = 1,$$

$$K(S_2 : S_1) = 1,$$

and, by Theorem 5,

$$L((S_1), (S_2)) = 1.$$

This is the proof of Theorem 14(a).


Also, Theorem 14(a) can be expressed in terms of the L-bivariate association coefficient.

Theorem 14(b)

$$L((S_1), (S_2)) = 1, \quad L(S_1) = 1, \quad L(S_2) = 1 \tag{45}$$

if and only if

$$L(X_i, X_j) = 1 \quad (i < j \text{ and } i, j = 1, 2, \ldots, n). \tag{47}$$

Proof: This is obvious from Theorem 5.

As a normalized measure, the L-multivariate association coefficient $L(X_1, \ldots, X_n)$ measures
the degree of the association among the random variables $X_1, \ldots, X_n$. If the random variables
$X_1, \ldots, X_n$ are considered as a system, the L-multivariate association coefficient measures the
degree of the association for this system. $L(X_1, \ldots, X_n) = 0$ means that this system is in the
independent state, while $L(X_1, \ldots, X_n) = 1$ means that this system is in a state in which the
whole system is equivalent to any one of the variables in this system in terms of information.
With the L-multivariate association coefficient, different systems of random variables are
comparable to each other in terms of their associations. Therefore, these systems can be classified
according to their associations.


3 Discussion 

Traditionally, the concept of the correlation coefficient among the random variables is
the one that measures the relation between two (groups of) random variables (Agresti, 2002;
Altham, 1970; Burg & Lewis, 1988; Goodman & Kruskal, 1954; Haberman, 1980; Theil, 1970).
For a system $S$ of multiple random variables in (38), the association among these random
variables $X_1, \ldots, X_n$ includes the relations between the different individual variables, the relations
between the different subgroups or subsystems of the random variables, and the interactions
of the different combinations of these variables. If the association of the system is zero, that
is, there is no association among the random variables in this system, then this system is
actually in the independent state. The mathematical expression for a system in the independent state is

$$p(x_1, \ldots, x_n) = p(x_1) \times \cdots \times p(x_n),$$

or

$$L(X_1, \ldots, X_n) = 0.$$

On the other hand, the system could be reduced to a single variable in terms of information.
That is, any individual variable could totally determine any other individual variable and
also totally depend on any other individual variable. In this case, the system is said to be
in the state of perfect association. The mathematical expression for a system in perfect association is

$$L(X_i, X_j) = 1 \quad (i < j \text{ and } i, j = 1, 2, \ldots, n),$$

or

$$L(X_1, \ldots, X_n) = 1.$$

The general case for the system $S$ should be between the perfect association and independent
state with the mathematical expression of

$$0 < L(X_1, \ldots, X_n) < 1.$$

If the system is in perfect association, the information carried by the multiple variables
$X_1, \ldots, X_n$ is nothing but the information carried by any individual variable among $X_1, \ldots, X_n$.
That is, without losing any information, only an individual variable among $X_1, \ldots, X_n$ is needed
instead of keeping all of these variables. Therefore, the L-multivariate association coefficient can
be used for variable reduction. Given a threshold value, say 0.95, all of the variables are classified
into several groups in which the L-multivariate association coefficient is larger than the given
threshold value, and then one variable is selected from each group. The selected variables will
carry most of the information of the original variables.
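
The grouping step described above can be sketched as follows. The paper does not specify how the groups are formed, so the single greedy pass below, the choice of the first member of each group as its representative, the function names, and the example system are all assumptions made for illustration. The sketch also assumes that no variable is constant, since (25) requires a positive joint entropy.

```python
import math
from collections import defaultdict

def subset_entropy(joint, idx):
    """Joint entropy of the variables at positions idx."""
    marginal = defaultdict(float)
    for state, p in joint.items():
        marginal[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)

def l_multivariate(joint, idx):
    """Coefficient (25) for the subsystem at positions idx (len(idx) >= 2)."""
    h_joint = subset_entropy(joint, idx)
    h_sum = sum(subset_entropy(joint, [i]) for i in idx)
    return (h_sum - h_joint) / ((len(idx) - 1) * h_joint)

def group_variables(joint, n, threshold):
    """Greedy single pass (an assumption): put each variable into the first
    group whose coefficient stays above the threshold, else open a new group.
    The result can depend on the order in which variables are visited."""
    groups = []
    for i in range(n):
        for group in groups:
            if l_multivariate(joint, group + [i]) > threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# Hypothetical system: X_2 copies X_1, X_4 is the complement of X_3,
# and the pair (X_1, X_2) is independent of the pair (X_3, X_4).
joint = {(a, a, b, 1 - b): 0.25 for a in (0, 1) for b in (0, 1)}
groups = group_variables(joint, 4, 0.95)
representatives = [group[0] for group in groups]
print(groups, representatives)
```

For this example the pass forms two groups, one per redundant pair, and keeping one variable from each group retains all of the information of the system.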

If the system is in an independent state, no information is shared among the variables in this
system; that is, there is no relation among the variables in the system. Actually the system can be
split into $n$ independent subsystems in which each subsystem contains one variable. In practice, if
the L-multivariate association coefficient is fairly small, say less than 0.01, the several groups of
variables in the system can be split into the several corresponding subsystems.

It is also noted that the association measured by the L-multivariate association coefficient
is a general type of association, not any specific type of linear or nonlinear association. The
L-multivariate association coefficient fully utilizes the information carried by the variables
$X_1, \ldots, X_n$ to measure the association among these variables. This is the basic difference between
the L-multivariate association coefficient and linear correlation.

In numerical analysis, the L-multivariate association coefficient can be directly calculated
from the joint probability distribution $p(x_1, \ldots, x_n)$. In practice, this joint probability distribution
may not be estimable in the case of small sample size. Small sample size does not mean a small
amount of information (large sample size does not mean a large amount of information). How to
fully utilize the information in a data set to estimate the L-multivariate association coefficient is a
topic for future study.
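
When the joint distribution is not known, the simplest option is to replace it with the relative frequencies observed in the data. The sketch below shows this naive plug-in calculation only as a baseline; it is not an estimator proposed in this paper, and it is exactly the kind of estimate that may be unreliable for small samples. The function names and the example data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def empirical_joint(samples):
    """Relative frequencies of the observed state tuples (naive plug-in)."""
    counts = Counter(tuple(row) for row in samples)
    total = sum(counts.values())
    return {state: c / total for state, c in counts.items()}

def subset_entropy(joint, idx):
    """Joint entropy of the variables at positions idx."""
    marginal = defaultdict(float)
    for state, p in joint.items():
        marginal[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log(p) for p in marginal.values() if p > 0)

def l_multivariate(joint, n):
    """Coefficient (25); assumes n >= 2 and a positive joint entropy."""
    h_joint = subset_entropy(joint, range(n))
    h_sum = sum(subset_entropy(joint, [i]) for i in range(n))
    return (h_sum - h_joint) / ((n - 1) * h_joint)

# Hypothetical data: each row is one observation of (X_1, X_2, X_3).
# Here X_2 always equals X_1 and X_3 is always its complement.
samples = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 1, 0), (0, 0, 1), (1, 1, 0)]
estimate = l_multivariate(empirical_joint(samples), 3)

# A balanced grid of two bits, each combination observed once.
grid = [(a, b) for a in (0, 1) for b in (0, 1)]
grid_estimate = l_multivariate(empirical_joint(grid), 2)
print(estimate, grid_estimate)
```

The first data set yields a value close to one and the second a value close to zero. With few observations relative to the number of possible states, the empirical frequencies can differ substantially from the true probabilities, so the resulting values should be read with caution.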